-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update P7 32-bit partial vector load cost #108261
Conversation
@llvm/pr-subscribers-llvm-analysis Author: None (RolandF77) ChangesUpdate cost model to reflect codegen change to use lfiwzx from #104507. Full diff: https://github.com/llvm/llvm-project/pull/108261.diff 2 Files Affected:
diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
index b7bdbeb535d526..df0047022a2c04 100644
--- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
+++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
@@ -802,12 +802,18 @@ InstructionCost PPCTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
// explicitly check this case. There are also corresponding store
// instructions.
unsigned MemBytes = Src->getPrimitiveSizeInBits();
- if (ST->hasVSX() && IsAltivecType &&
- (MemBytes == 64 || (ST->hasP8Vector() && MemBytes == 32)))
- return 1;
+ Align AlignBytes = Alignment ? *Alignment : Align(1);
+ unsigned SrcBytes = LT.second.getStoreSize();
+ if (ST->hasVSX() && IsAltivecType) {
+ if (MemBytes == 64 || (ST->hasP8Vector() && MemBytes == 32))
+ return 1;
+ // Use lfiwax/xxspltw
+ if (Opcode == Instruction::Load && MemBytes == 32)
+ if (AlignBytes < SrcBytes || Cost > 2)
+ return 2;
+ }
// Aligned loads and stores are easy.
- unsigned SrcBytes = LT.second.getStoreSize();
if (!SrcBytes || !Alignment || *Alignment >= SrcBytes)
return Cost;
diff --git a/llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll b/llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll
index 54cafa0ae59f39..0e7e89c18c1cba 100644
--- a/llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll
+++ b/llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll
@@ -1,18 +1,21 @@
; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr8 -mattr=+vsx | FileCheck %s
+; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=+vsx | FileCheck --check-prefix=P7 %s
target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
target triple = "powerpc64-unknown-linux-gnu"
define i32 @loads(i32 %arg) {
; CHECK: cost of 1 {{.*}} load
+ ; P7: cost of 2 {{.*}} load
load <4 x i8>, ptr undef, align 1
- ; CHECK: cost of 1 {{.*}} load
+ ; CHECK, P7: cost of 1 {{.*}} load
load <8 x i8>, ptr undef, align 1
; CHECK: cost of 1 {{.*}} load
+ ; P7: cost of 2 {{.*}} load
load <2 x i16>, ptr undef, align 2
- ; CHECK: cost of 1 {{.*}} load
+ ; CHECK, P7: cost of 1 {{.*}} load
load <4 x i16>, ptr undef, align 2
ret i32 undef
|
@llvm/pr-subscribers-backend-powerpc Author: None (RolandF77) ChangesUpdate cost model to reflect codegen change to use lfiwzx from #104507. Full diff: https://github.com/llvm/llvm-project/pull/108261.diff 2 Files Affected:
diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
index b7bdbeb535d526..df0047022a2c04 100644
--- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
+++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
@@ -802,12 +802,18 @@ InstructionCost PPCTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
// explicitly check this case. There are also corresponding store
// instructions.
unsigned MemBytes = Src->getPrimitiveSizeInBits();
- if (ST->hasVSX() && IsAltivecType &&
- (MemBytes == 64 || (ST->hasP8Vector() && MemBytes == 32)))
- return 1;
+ Align AlignBytes = Alignment ? *Alignment : Align(1);
+ unsigned SrcBytes = LT.second.getStoreSize();
+ if (ST->hasVSX() && IsAltivecType) {
+ if (MemBytes == 64 || (ST->hasP8Vector() && MemBytes == 32))
+ return 1;
+ // Use lfiwax/xxspltw
+ if (Opcode == Instruction::Load && MemBytes == 32)
+ if (AlignBytes < SrcBytes || Cost > 2)
+ return 2;
+ }
// Aligned loads and stores are easy.
- unsigned SrcBytes = LT.second.getStoreSize();
if (!SrcBytes || !Alignment || *Alignment >= SrcBytes)
return Cost;
diff --git a/llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll b/llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll
index 54cafa0ae59f39..0e7e89c18c1cba 100644
--- a/llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll
+++ b/llvm/test/Analysis/CostModel/PowerPC/vsr_load_32_64.ll
@@ -1,18 +1,21 @@
; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr8 -mattr=+vsx | FileCheck %s
+; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=+vsx | FileCheck --check-prefix=P7 %s
target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
target triple = "powerpc64-unknown-linux-gnu"
define i32 @loads(i32 %arg) {
; CHECK: cost of 1 {{.*}} load
+ ; P7: cost of 2 {{.*}} load
load <4 x i8>, ptr undef, align 1
- ; CHECK: cost of 1 {{.*}} load
+ ; CHECK, P7: cost of 1 {{.*}} load
load <8 x i8>, ptr undef, align 1
; CHECK: cost of 1 {{.*}} load
+ ; P7: cost of 2 {{.*}} load
load <2 x i16>, ptr undef, align 2
- ; CHECK: cost of 1 {{.*}} load
+ ; CHECK, P7: cost of 1 {{.*}} load
load <4 x i16>, ptr undef, align 2
ret i32 undef
|
return 1; | ||
// Use lfiwax/xxspltw | ||
if (Opcode == Instruction::Load && MemBytes == 32) | ||
if (AlignBytes < SrcBytes || Cost > 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic if(Cost > 2) return 2; seems not perfect. Can we exclude the case at the place where Cost is set to be bigger than 2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cost check removed. It was left over from an earlier version of the change.
if (ST->hasVSX() && IsAltivecType && | ||
(MemBytes == 64 || (ST->hasP8Vector() && MemBytes == 32))) | ||
return 1; | ||
Align AlignBytes = Alignment ? *Alignment : Align(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can put the definition of the variable before the
if (Opcode == Instruction::Load && MemBytes == 32 && AlignBytes < SrcBytes)
)
target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64" | ||
target triple = "powerpc64-unknown-linux-gnu" | ||
|
||
define i32 @loads(i32 %arg) { | ||
; CHECK: cost of 1 {{.*}} load | ||
; P7: cost of 2 {{.*}} load |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would change to
opt < %s -passes="print" 2>&1 -disable-output -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr8 -mattr=+vsx | FileCheck %s -DCOST=1
opt < %s -passes="print" 2>&1 -disable-output -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=+vsx | FileCheck %s -DCOST=2
add change
; CHECK: cost of 1 {{.*}} load
; P7: cost of 2 {{.*}} load
to
CHECK: cost of [[COST]] {{.*}} load
it is only a suggestion, feel free to keep it if you do not want to modify
return 1; | ||
unsigned SrcBytes = LT.second.getStoreSize(); | ||
if (ST->hasVSX() && IsAltivecType) { | ||
if (MemBytes == 64 || (ST->hasP8Vector() && MemBytes == 32)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this maybe not related to the patch,
I think the variable MemBytes
should be MemBits
, otherwise it maybe cause mis_understand (it looks like compare with 64bytes and 32bytes)
if (MemBytes == 64 || (ST->hasP8Vector() && MemBytes == 32))
I think we can modify the variable name in the patch by the way?
|
||
// Use lfiwax/xxspltw | ||
Align AlignBytes = Alignment ? *Alignment : Align(1); | ||
if (Opcode == Instruction::Load && MemBytes == 32 && AlignBytes < SrcBytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious that why need the AlignBytes < SrcBytes
here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a partial vector (< 128 bits) is being loaded with a full vector aligned address (>= 128 bits), the load will be done as a full vector load since we know from alignment that it is safe. Therefore the cost of a partial vector load does not apply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for explaining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not have further comment on it. thanks for addressing the comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think overall LGTM. I have one question I was wondering about.
Is it more accurate to update the title so that it doesn't mention v4i8 specifically? For example, doesn't v2i16 also count?
Update cost model to reflect codegen change to use lfiwzx for 32-bit partial vector loads on pwr7 with llvm#104507.
Update cost model to reflect codegen change to use lfiwzx from #104507.